Scalable kernel-based variable selection with sparsistency
Abstract
Variable selection is central to high-dimensional data analysis, and many algorithms have been developed for it. Ideally, a variable selection algorithm should be flexible, scalable, and theoretically guaranteed, yet most existing algorithms cannot attain all three properties at the same time. In this article, a three-step variable selection algorithm is developed: kernel-based estimation of the regression function, kernel-based estimation of its gradient functions, and a hard-thresholding step. Its key advantage is that it makes no explicit model assumption, admits general predictor effects, allows for scalable computation, and attains desirable asymptotic sparsistency. The proposed algorithm can be adapted to any reproducing kernel Hilbert space (RKHS) with different kernel functions, and can be extended to interaction selection with slight modification. Its computational cost is only linear in the data dimension and can be further reduced through parallel computing. The sparsistency of the proposed algorithm is established for general RKHSs under mild conditions, including linear and Gaussian kernels as special cases. Its effectiveness is also supported by a variety of simulated and real examples.
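To make the three steps concrete, here is a minimal NumPy sketch under assumptions of convenience: a Gaussian kernel, kernel ridge regression for the function estimate, analytic kernel derivatives for the gradient estimates, and a hard threshold on the empirical gradient norms. The function name `select_variables`, the bandwidth `sigma`, the ridge penalty `lam`, and the threshold `v_n` are all illustrative choices, not the paper's notation.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """Gram matrix with entries exp(-||a - b||^2 / (2 sigma^2))."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * sigma**2))

def select_variables(X, y, sigma=1.0, lam=0.1, v_n=0.05):
    """Three-step sketch: fit the regression, differentiate it, threshold."""
    n, p = X.shape
    # Step 1: kernel ridge regression, f_hat(x) = sum_i alpha_i K(x_i, x).
    K = gaussian_kernel(X, X, sigma)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y)
    # Step 2: gradients of f_hat at the sample points.  For the Gaussian
    # kernel, d K(x_i, x) / d x_j = K(x_i, x) * (x_ij - x_j) / sigma^2.
    diff = X[:, None, :] - X[None, :, :]      # diff[i, k, j] = x_ij - x_kj
    grad = np.einsum("i,ikj->kj", alpha, K[:, :, None] * diff) / sigma**2
    grad_norms = np.sqrt((grad**2).mean(axis=0))  # empirical norm per variable
    # Step 3: hard thresholding on the gradient norms.
    return np.flatnonzero(grad_norms > v_n), grad_norms
```

Since each variable contributes one gradient-norm statistic, the selection step costs only O(p) once the coefficients are fitted, and the per-variable gradient computations are embarrassingly parallel, in line with the scalability claims above.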
Similar resources
Scalable Bayesian Kernel Models with Variable Selection
Nonlinear kernels are used extensively in regression models in statistics and machine learning since they often improve predictive accuracy. Variable selection is a challenge in the context of kernel-based regression models. In linear regression, the concept of an effect size for the regression coefficients is very useful for variable selection. In this paper we provide an analog for the effect ...
High-Dimensional Gaussian Graphical Model Selection: Tractable Graph Families
We consider the problem of high-dimensional Gaussian graphical model selection. We identify a set of graphs for which an efficient estimation algorithm exists, and this algorithm is based on thresholding of empirical conditional covariances. Under a set of transparent conditions, we establish structural consistency (or sparsistency) for the proposed algorithm, when the number of samples n = ω(J...
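As a rough illustration of the thresholding idea described in this snippet (not the paper's exact procedure), the toy sketch below keeps an edge only if the empirical conditional covariance of the pair stays above a threshold for every small conditioning set; the set-size bound `eta` and the threshold `xi` are hypothetical parameters.

```python
import itertools
import numpy as np

def cond_cov(C, i, j, S):
    """Empirical conditional covariance of X_i and X_j given X_S."""
    if len(S) == 0:
        return C[i, j]
    S = list(S)
    return C[i, j] - C[i, S] @ np.linalg.solve(C[np.ix_(S, S)], C[S, j])

def screen_edges(X, eta=1, xi=0.1):
    """Keep edge (i, j) only if |cov(X_i, X_j | X_S)| > xi for all |S| <= eta."""
    C = np.cov(X, rowvar=False)
    p = C.shape[0]
    edges = []
    for i, j in itertools.combinations(range(p), 2):
        rest = [k for k in range(p) if k not in (i, j)]
        sets = itertools.chain.from_iterable(
            itertools.combinations(rest, r) for r in range(eta + 1))
        if all(abs(cond_cov(C, i, j, S)) > xi for S in sets):
            edges.append((i, j))
    return edges
```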
Scalable Kernel Embedding of Latent Variable Models
Kernel embedding of distributions maps distributions to the reproducing kernel Hilbert space (RKHS) of a kernel function, such that subsequent manipulations of distributions can be achieved via RKHS distances, linear and multilinear transformations, and spectral analysis. This framework has led to simple and effective nonparametric algorithms in various machine learning problems, such as featur...
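A minimal sketch of the embedding idea, assuming a Gaussian kernel: each sample is mapped to the mean of its feature maps, and two distributions are then compared through the RKHS distance between their mean embeddings (the maximum mean discrepancy, here its biased V-statistic estimator). The bandwidth `sigma` is an illustrative choice.

```python
import numpy as np

def gaussian_gram(A, B, sigma=1.0):
    """Gram matrix exp(-||a - b||^2 / (2 sigma^2)) between two samples."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * sigma**2))

def mmd2(X, Y, sigma=1.0):
    """Squared RKHS distance ||mu_X - mu_Y||^2 between mean embeddings."""
    return (gaussian_gram(X, X, sigma).mean()
            - 2.0 * gaussian_gram(X, Y, sigma).mean()
            + gaussian_gram(Y, Y, sigma).mean())
```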
High-dimensional Gaussian graphical model selection: walk summability and local separation criterion
We consider the problem of high-dimensional Gaussian graphical model selection. We identify a set of graphs for which an efficient estimation algorithm exists, and this algorithm is based on thresholding of empirical conditional covariances. Under a set of transparent conditions, we establish structural consistency (or sparsistency) for the proposed algorithm, when the number of samples n = Ω(J...
Asymptotic distribution and sparsistency for ℓ1-penalized parametric M-estimators, with applications to linear SVM and logistic regression
Since its early use in least squares regression problems, the ℓ1-penalization framework for variable selection has been employed in conjunction with a wide range of loss functions encompassing regression, classification, and survival analysis. While a well-developed theory exists for the ℓ1-penalized least squares estimates, few results concern the behavior of ℓ1-penalized estimates for general ...
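For a concrete look at the two ℓ1-penalized M-estimators named in the title above, here is a small scikit-learn example (an assumption of convenience; the referenced paper is theoretical). The nonzero coefficients are the selected variables.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 20))
# Only the first two variables matter in this synthetic example.
y = (X[:, 0] - 2.0 * X[:, 1] + 0.5 * rng.standard_normal(200) > 0).astype(int)

# l1-penalized logistic regression and l1-penalized linear SVM
# (liblinear requires the squared hinge loss with the l1 penalty).
logit = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X, y)
svm = LinearSVC(penalty="l1", loss="squared_hinge", dual=False, C=0.5).fit(X, y)

print("logistic support:", np.flatnonzero(logit.coef_))
print("SVM support:", np.flatnonzero(svm.coef_))
```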
Journal: CoRR
Volume: abs/1802.09246
Pages: -
Published: 2018